
Android Upper-Layer WatchDog Study Notes

2024-07-08 14:38 | Source: compiled from the web

I. Overview

1. Understanding how WatchDog works gives a clearer picture of how system services run.

II. WatchDog Implementation

1. Source location

//frameworks/base/services/core/java/com/android/server/Watchdog.java
public class Watchdog extends Thread {
    ...
}

So Watchdog is itself a thread.

2. WatchDog is started in SystemServer.java

run()                          //SystemServer.java
  startBootstrapServices()     //SystemServer.java
    traceBeginAndSlog("StartWatchdog");
    final Watchdog watchdog = Watchdog.getInstance();
    watchdog.start();
    traceEnd();
    ...
    traceBeginAndSlog("InitWatchdog");
    watchdog.init(mSystemContext, mActivityManagerService);
    traceEnd();

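So Watchdog is a helper thread running inside system_server. Being a thread, it only needs start() to begin running; init() is called afterwards to hand it the system Context and ActivityManagerService.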

3. WatchDog constructor

private Watchdog() {
    super("watchdog");
    // not checking the background thread, shared foreground thread is the main checker. Thread name "android.fg"
    mMonitorChecker = new HandlerChecker(FgThread.getHandler(), "foreground thread", DEFAULT_TIMEOUT);
    mHandlerCheckers.add(mMonitorChecker);
    // Add checker for main thread. only do a quick check since there can be UI running on the thread.
    mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()), "main thread", DEFAULT_TIMEOUT));
    // Add checker for shared UI thread. Thread name "android.ui"
    mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(), "ui thread", DEFAULT_TIMEOUT));
    // And also check IO thread. Thread name "android.io"
    mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(), "i/o thread", DEFAULT_TIMEOUT));
    // And the display thread. Thread name "android.display"
    mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(), "display thread", DEFAULT_TIMEOUT));
    // And the animation thread. Thread name "android.anim"
    mHandlerCheckers.add(new HandlerChecker(AnimationThread.getHandler(), "animation thread", DEFAULT_TIMEOUT));
    // And the surface animation thread. Thread name "android.anim.lf"
    mHandlerCheckers.add(new HandlerChecker(SurfaceAnimationThread.getHandler(), "surface animation thread", DEFAULT_TIMEOUT));
    // Initialize monitor for Binder threads.
    addMonitor(new BinderThreadMonitor());
    mOpenFdMonitor = OpenFdMonitor.create();
    HandlerThread handlerThread = new HandlerThread("workThread"); // the "workThread" thread inside system_server
    handlerThread.start();
    mWorkHandler = new Handler(handlerThread.getLooper()) {
        @Override
        public void handleMessage(Message msg) {
            switch (msg.what) {
                case MESSAGE_AFE_CHECK_ERROR:
                    checkAfeStatus(false);
                    break;
                case MESSAGE_AFE_CHECK_OVER:
                    Slog.i(TAG, "release observer");
                    mFileObserver.stopWatching();
                    mFileObserver = null;
                    checkAfeStatus(true);
                    getLooper().quitSafely();
                    mWorkHandler = null;
                    break;
            }
        }
    };
    // See the notes on DEFAULT_TIMEOUT.
    assert DB || DEFAULT_TIMEOUT > ZygoteConnectionConstants.WRAPPED_PID_TIMEOUT_MILLIS;
}

Two objects deserve attention: mMonitorChecker and mHandlerCheckers.

Elements of the mHandlerCheckers list come from:

(1) The constructor: the handlers of FgThread, the main thread, UiThread, IoThread, DisplayThread, AnimationThread and SurfaceAnimationThread are added.

(2) External registration: Watchdog.getInstance().addThread(handler);

Elements of mMonitorChecker's monitor list come from:

(1) External registration: Watchdog.getInstance().addMonitor(monitor);

(2) A special case, added by the constructor itself: addMonitor(new BinderThreadMonitor());
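To make the two sources concrete, here is a minimal plain-Java model of the bookkeeping. It is a sketch, not the AOSP class: `WatchdogRegistry`, `Checker` and the string thread names are hypothetical, only three threads are registered for brevity, and the empty lambda stands in for BinderThreadMonitor. What it shows is that addThread() adds a new checker with no monitors, while addMonitor() always appends to the single monitor checker.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of how Watchdog fills its checker lists.
final class WatchdogRegistry {
    interface Monitor {
        void monitor();
    }

    static final class Checker {
        final String name;
        final List<Monitor> monitors = new ArrayList<>();

        Checker(String name) {
            this.name = name;
        }
    }

    private final List<Checker> handlerCheckers = new ArrayList<>();
    // The foreground-thread checker doubles as the monitor checker.
    private final Checker monitorChecker = new Checker("foreground thread");

    WatchdogRegistry() {
        handlerCheckers.add(monitorChecker);
        handlerCheckers.add(new Checker("main thread"));
        handlerCheckers.add(new Checker("ui thread"));
        // Stand-in for addMonitor(new BinderThreadMonitor()).
        addMonitor(() -> { });
    }

    // Like Watchdog.addThread(handler): one more queue to watch, no monitors.
    synchronized void addThread(String name) {
        handlerCheckers.add(new Checker(name));
    }

    // Like Watchdog.addMonitor(monitor): always lands on the monitor checker.
    synchronized void addMonitor(Monitor monitor) {
        monitorChecker.monitors.add(monitor);
    }

    synchronized int checkerCount() {
        return handlerCheckers.size();
    }

    // Number of checkers that actually have monitors to call.
    synchronized long checkersWithMonitors() {
        return handlerCheckers.stream().filter(c -> !c.monitors.isEmpty()).count();
    }
}
```

However many monitors are registered, only one checker ever carries them, which is why the rest of the article treats mMonitorChecker separately.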

4. WatchDog's run() method

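//Watchdog.java
public void run() {
    while (true) {
        ...
        synchronized (this) {
            for (int i=0; i<mHandlerCheckers.size(); i++) {
                HandlerChecker hc = mHandlerCheckers.get(i);
                hc.scheduleCheckLocked();
            }
            ...
        }
        ...
    }
}

//Watchdog.HandlerChecker
public void scheduleCheckLocked() {
    if ((mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling())
            || (mPauseCount > 0)) {
        mCompleted = true;
        return;
    }
    if (!mCompleted) {
        // we already have a check in flight, so no need
        return;
    }
    mCompleted = false;
    mCurrentMonitor = null;
    mStartTime = SystemClock.uptimeMillis();
    mHandler.postAtFrontOfQueue(this);
}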

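The mMonitors.size() == 0 case applies to the ordinary checkers in mHandlerCheckers, which only need to show that their thread's message queue is not blocked. The technique is mHandler.getLooper().getQueue().isPolling(): if the looper is currently idle, waiting for new messages, the thread cannot be stuck, so the check is marked complete right away without posting anything.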

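mMonitorChecker always has at least one monitor (BinderThreadMonitor), so for it, and for any checker whose thread is busy, the key call is mHandler.postAtFrontOfQueue(this), which puts the HandlerChecker itself at the head of the thread's message queue so that its run() is the next thing the thread executes.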
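The scheduling decision can be tried outside Android with a small model. This is a sketch under stated assumptions: `MiniLooper` is a hypothetical stand-in for Looper/MessageQueue (a thread draining a `LinkedBlockingDeque`), its `isPolling()` only approximates MessageQueue.isPolling(), and the checker has no monitors. It exercises both paths: an idle looper lets the check complete without posting, while a busy looper gets the checker inserted ahead of work that was already queued.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.function.BooleanSupplier;

// Plain-Java model of scheduleCheckLocked() for a checker with no monitors.
final class ScheduleCheckSketch {
    static final class MiniLooper extends Thread {
        private final LinkedBlockingDeque<Runnable> queue = new LinkedBlockingDeque<>();
        private volatile boolean waiting;

        MiniLooper() { setDaemon(true); }

        @Override
        public void run() {
            try {
                while (true) {
                    waiting = true;              // about to block for the next message
                    Runnable r = queue.take();
                    waiting = false;
                    r.run();
                }
            } catch (InterruptedException e) {
                // Interrupting the thread ends the loop, like Looper.quit().
            }
        }

        // Rough analogue of MessageQueue.isPolling(): idle with nothing queued.
        boolean isPolling() { return waiting && queue.isEmpty(); }

        void post(Runnable r) { queue.offerLast(r); }

        // Analogue of Handler.postAtFrontOfQueue(): jump ahead of pending work.
        void postAtFrontOfQueue(Runnable r) { queue.offerFirst(r); }
    }

    static final class Checker implements Runnable {
        private final MiniLooper looper;
        private final List<String> log;
        private boolean completed = true;

        Checker(MiniLooper looper, List<String> log) {
            this.looper = looper;
            this.log = log;
        }

        /** Returns true if a check message was actually posted. */
        synchronized boolean scheduleCheck() {
            if (looper.isPolling()) {    // idle looper: as good as not deadlocked
                completed = true;
                return false;
            }
            if (!completed) return false; // a check is already in flight
            completed = false;
            looper.postAtFrontOfQueue(this);
            return true;
        }

        @Override
        public synchronized void run() {
            log.add("checker");
            completed = true;
        }
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static void spinUntil(BooleanSupplier condition) {
        while (!condition.getAsBoolean()) Thread.yield();
    }

    /** Runs one idle check and one busy check; returns the order events happened in. */
    static List<String> demo() {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        MiniLooper looper = new MiniLooper();
        looper.start();
        Checker checker = new Checker(looper, log);

        spinUntil(looper::isPolling);
        log.add(checker.scheduleCheck() ? "posted" : "skipped (polling)");

        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        looper.post(() -> { started.countDown(); awaitQuietly(release); log.add("busy"); });
        awaitQuietly(started);
        looper.post(() -> log.add("normal"));      // queued behind the busy message
        log.add(checker.scheduleCheck() ? "posted" : "skipped (polling)");
        release.countDown();
        spinUntil(() -> log.size() == 5);
        looper.interrupt();
        return new ArrayList<>(log);
    }
}
```

Running demo() yields skipped (polling), posted, busy, checker, normal: the checker runs as soon as the busy message finishes, before the message that was queued earlier.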

5. HandlerChecker's run()

public final class HandlerChecker implements Runnable {
    ...
    @Override
    public void run() {
        final int size = mMonitors.size();
        for (int i = 0 ; i < size ; i++) {
            synchronized (Watchdog.this) {
                mCurrentMonitor = mMonitors.get(i);
            }
            mCurrentMonitor.monitor();
        }
        synchronized (Watchdog.this) {
            mCompleted = true;
            mCurrentMonitor = null;
        }
    }
    ...
}

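The technique: call each registered monitor's monitor() method in turn, recording the current one in mCurrentMonitor so that a timeout report can name the monitor that blocked.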

(1) The loop walks mMonitors, and the only checker with a non-empty list is mMonitorChecker; services join that list through addMonitor(), for example:

Watchdog.getInstance().addMonitor(this); //ActivityManagerService.java
Watchdog.getInstance().addMonitor(this); //InputManagerService.java
Watchdog.getInstance().addMonitor(this); //PowerManagerService.java
Watchdog.getInstance().addMonitor(this); //WindowManagerService.java

The monitor() methods themselves are trivial; here is ActivityManagerService's:

public void monitor() {
    synchronized (this) { }
}

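It only checks whether the service's lock has been held for a long time: if another thread is sitting inside synchronized (this) on the AMS object, monitor() blocks, mCompleted stays false, and the watchdog eventually reports a timeout.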
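Why an empty synchronized block is enough can be seen in a few lines of plain Java. In this sketch `FakeService` is hypothetical, and a helper thread plus a join() timeout stands in for the watchdog's own timing: monitor() returns immediately when the lock is free, and stays blocked for as long as another thread holds it.

```java
import java.util.concurrent.CountDownLatch;

// Sketch: an empty synchronized block returns only once the lock is free.
final class LockMonitorSketch {
    static final class FakeService {
        // Same shape as ActivityManagerService.monitor().
        void monitor() {
            synchronized (this) { }
        }
    }

    /** Runs monitor() on a helper thread; true if it returns within timeoutMs. */
    static boolean monitorCompletes(FakeService service, long timeoutMs) {
        Thread checker = new Thread(service::monitor, "checker");
        checker.setDaemon(true);
        checker.start();
        try {
            checker.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !checker.isAlive();
    }

    /** Holds the service lock on another thread, then runs the same check. */
    static boolean monitorCompletesWhileLockHeld(long timeoutMs) {
        FakeService service = new FakeService();
        CountDownLatch locked = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (service) {          // simulates a long-running locked operation
                locked.countDown();
                try {
                    release.await();
                } catch (InterruptedException ignored) {
                    // just let go of the lock
                }
            }
        });
        holder.setDaemon(true);
        holder.start();
        try {
            locked.await();
            return monitorCompletes(service, timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown();              // lets the blocked checker finish
        }
    }
}
```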

(2) A special case: BinderThreadMonitor

private static final class BinderThreadMonitor implements Watchdog.Monitor {
    @Override
    public void monitor() {
        Binder.blockUntilThreadAvailable();
    }
}

//frameworks/base/core/java/android/os/Binder.java
public static final native void blockUntilThreadAvailable();

//frameworks/native/libs/binder/IPCThreadState.cpp
void IPCThreadState::blockUntilThreadAvailable()
{
    pthread_mutex_lock(&mProcess->mThreadCountLock);
    while (mProcess->mExecutingThreadsCount >= mProcess->mMaxThreads) {
        ALOGW("Waiting for thread to be free. mExecutingThreadsCount=%lu mMaxThreads=%lu\n",
                static_cast<unsigned long>(mProcess->mExecutingThreadsCount),
                static_cast<unsigned long>(mProcess->mMaxThreads));
        pthread_cond_wait(&mProcess->mThreadCountDecrement, &mProcess->mThreadCountLock);
    }
    pthread_mutex_unlock(&mProcess->mThreadCountLock);
}

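This only checks that the number of binder threads currently executing in the process has not reached mMaxThreads; when all of them (31 in system_server) are busy, the call waits until one is freed. By default each process gets at most 15 binder threads, but system_server raises its own limit to 31: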

//frameworks/native/libs/binder/ProcessState.cpp
#define DEFAULT_MAX_BINDER_THREADS 15

//frameworks/base/services/java/com/android/server/SystemServer.java
public final class SystemServer {
    private static final int sMaxBinderThreads = 31;

    private void run() {
        BinderInternal.setMaxThreads(sMaxBinderThreads); // set before any service is started
        ...
        startBootstrapServices();
    }
}
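The binder-thread check is a classic wait-while-full loop. The following Java model is a sketch under assumptions, not the libbinder code: `BinderThreadLimitSketch`, threadStarted() and threadFinished() are hypothetical stand-ins for mExecutingThreadsCount and the mThreadCountDecrement condition, and awaitThreadAvailable() is a bounded variant added only so the behaviour can be observed without hanging.

```java
// Plain-Java model of IPCThreadState::blockUntilThreadAvailable().
// All names here are hypothetical; only the wait loop mirrors the C++ code.
final class BinderThreadLimitSketch {
    private final int maxThreads;   // 31 in system_server, 15 by default
    private int executing;          // plays the role of mExecutingThreadsCount

    BinderThreadLimitSketch(int maxThreads) {
        this.maxThreads = maxThreads;
    }

    synchronized void threadStarted() {
        executing++;
    }

    // Like signalling mThreadCountDecrement when a binder thread finishes.
    synchronized void threadFinished() {
        executing--;
        notifyAll();
    }

    // Same loop as the C++ version: wait while executing >= maxThreads.
    synchronized void blockUntilThreadAvailable() throws InterruptedException {
        while (executing >= maxThreads) {
            wait();
        }
    }

    /** Bounded variant for experiments: true if a thread is free within timeoutMs. */
    synchronized boolean awaitThreadAvailable(long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        try {
            while (executing >= maxThreads) {
                long leftMs = (deadline - System.nanoTime()) / 1_000_000L;
                if (leftMs <= 0) {
                    return false;
                }
                wait(leftMs);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In the real system the watchdog never needs a bounded variant: BinderThreadMonitor simply blocks, and the surrounding HandlerChecker timeout is what turns a saturated binder pool into a watchdog report.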

6. What WatchDog does after a timeout

//Watchdog.java
public void run() {
    ...
    Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
    WatchdogDiagnostics.diagnoseCheckers(blockedCheckers);
    Slog.w(TAG, "*** GOODBYE!");
    Process.killProcess(Process.myPid());
    System.exit(10);
}

It kills its own process (system_server) and exits.

III. WatchDog log output

1. process stack traces

The output location is controlled by dalvik.vm.stack-trace-file or dalvik.vm.stack-trace-dir, normally /data/anr. The traces are written by calling ActivityManagerService.dumpStackTraces().

//Watchdog.java
public class Watchdog extends Thread {
    public void run() {
        while (true) {
            ...
            if (!fdLimitTriggered) {
                if (waitState == WAITED_HALF) {
                    if (!waitedHalf) {
                        Slog.i(TAG, "WAITED_HALF");
                        // We've waited half the deadlock-detection interval. Pull a stack
                        // trace and wait another half.
                        ArrayList<Integer> pids = new ArrayList<Integer>();
                        pids.add(Process.myPid());
                        ActivityManagerService.dumpStackTraces(pids, null, null,
                                getInterestingNativePids());
                    }
                }
            }
            ...
            final File stack = ActivityManagerService.dumpStackTraces(pids, null, null,
                    getInterestingNativePids());
            ...
        }
    }
}

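Note that process stack traces are also dumped once the checkers have been blocked for half the timeout (WAITED_HALF), so a hang that reaches the full timeout produces two dumps: one at the halfway point and one at the timeout.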
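The WAITED_HALF state comes from grading each checker by how long its check has been outstanding. The sketch below follows the logic AOSP's HandlerChecker is generally described as using, but treat the thresholds (half the timeout, then the full timeout) and the names COMPLETED, WAITING and OVERDUE as assumptions for any particular release; only WAITED_HALF appears in the code above. The watchdog loop then acts on the worst state across all checkers.

```java
// Sketch of per-checker grading; constants and thresholds are assumptions
// modeled on AOSP, not copied from a specific release.
final class CompletionStateSketch {
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    /** State of one checker, given whether it finished and how long it has run. */
    static int stateOf(boolean completed, long startMs, long nowMs, long timeoutMs) {
        if (completed) {
            return COMPLETED;
        }
        long latency = nowMs - startMs;
        if (latency < timeoutMs / 2) {
            return WAITING;
        }
        if (latency < timeoutMs) {
            return WAITED_HALF;      // triggers the first stack dump
        }
        return OVERDUE;              // triggers the kill path
    }

    /** The watchdog loop acts on the worst state across all checkers. */
    static int worst(int... states) {
        int result = COMPLETED;
        for (int s : states) {
            result = Math.max(result, s);
        }
        return result;
    }
}
```

With a 60000 ms timeout, for example, a check outstanding for 30000 ms is WAITED_HALF and one outstanding for 60000 ms is OVERDUE.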

2. slog

Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
Slog.w(TAG, "*** GOODBYE!");

3. event log

EventLog.writeEvent(EventLogTags.WATCHDOG, subject);

4. kernel stack traces

// Trigger the kernel to dump all blocked threads, and backtraces on all CPUs to the kernel log
doSysRq('w');
doSysRq('l');

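This triggers two sysrq commands, show-blocked-tasks (w) and show-backtrace-all-active-cpus (l), which write backtraces of the threads in uninterruptible (D) state and of all active CPUs to the kernel log.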
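Triggering a sysrq from user space amounts to writing its command letter to /proc/sysrq-trigger, which requires root on a real device. This sketch of a doSysRq()-style helper assumes exactly that mechanism; the `triggerPath` parameter is a hypothetical addition so the helper can be exercised against an ordinary file.

```java
import java.io.FileWriter;
import java.io.IOException;

// Sketch of a doSysRq()-style helper: write one command char to the trigger file.
final class SysRqSketch {
    static final String SYSRQ_TRIGGER = "/proc/sysrq-trigger";

    /** Writes one sysrq command char; returns false if the write fails. */
    static boolean doSysRq(String triggerPath, char command) {
        try (FileWriter trigger = new FileWriter(triggerPath)) {
            trigger.write(command);   // e.g. 'w' = blocked tasks, 'l' = all-CPU backtraces
            return true;
        } catch (IOException e) {
            System.err.println("Failed to write to " + triggerPath + ": " + e);
            return false;
        }
    }
}
```

On a device, `SysRqSketch.doSysRq(SysRqSketch.SYSRQ_TRIGGER, 'w')` would correspond to the first call above; failures are only logged, since the watchdog should proceed to its other diagnostics regardless.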

5. dropbox

Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
    public void run() {
        // If a watched thread hangs before init() is called, we don't have a
        // valid mActivity. So we can't log the error to dropbox.
        if (mActivity != null) {
            mActivity.addErrorToDropBox(
                    "watchdog", null, "system_server", null, null, null,
                    subject, null, stack, null);
        }
        StatsLog.write(StatsLog.SYSTEM_SERVER_WATCHDOG_OCCURRED, subject);
    }
};
dropboxThread.start();

Note that DropBox entries are normally stored under /data/system/dropbox; the directory is set here:

//frameworks/base/services/core/java/com/android/server/DropBoxManagerService.java
public DropBoxManagerService(final Context context) {
    this(context, new File("/data/system/dropbox"), FgThread.get().getLooper());
}

IV. Why UiThread, IoThread, DisplayThread and FgThread are monitored

1. These four classes extend ServiceThread and are singletons. Take UiThread.java as an example:

//frameworks/base/services/core/java/com/android/server/UiThread.java
public final class UiThread extends ServiceThread {
    private UiThread() {
        super("android.ui", Process.THREAD_PRIORITY_FOREGROUND, false /*allowIo*/);
    }

    @Override
    public void run() {
        // Make sure UiThread is in the fg stune boost group
        Process.setThreadGroup(Process.myTid(), Process.THREAD_GROUP_TOP_APP);
        super.run();
    }

    private static void ensureThreadLocked() {
        if (sInstance == null) {
            sInstance = new UiThread();
            sInstance.start();
            final Looper looper = sInstance.getLooper();
            looper.setTraceTag(Trace.TRACE_TAG_SYSTEM_SERVER);
            looper.setSlowLogThresholdMs(
                    SLOW_DISPATCH_THRESHOLD_MS, SLOW_DELIVERY_THRESHOLD_MS);
            sHandler = new Handler(sInstance.getLooper());
        }
    }

    public static UiThread get() {
        synchronized (UiThread.class) {
            ensureThreadLocked();
            return sInstance;
        }
    }

    public static Handler getHandler() {
        synchronized (UiThread.class) {
            ensureThreadLocked();
            return sHandler;
        }
    }
}

(1) get() returns the singleton instance.

(2) getHandler() returns the Handler bound to that thread's Looper.

(3) Note that ensureThreadLocked() calls start() as soon as it creates the instance, so the thread is already running by the time the object is first obtained.
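The lazy-start singleton pattern can be reproduced with a plain Java thread. In this sketch `SketchUiThread` and its post() method are hypothetical: a `LinkedBlockingQueue` stands in for the Looper's message queue, and post() stands in for getHandler().post(). As in UiThread, the first get() both creates and starts the thread while holding the class lock.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Plain-Java analogue of UiThread's lazily created, immediately started singleton.
final class SketchUiThread extends Thread {
    private static SketchUiThread sInstance;
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    private SketchUiThread() {
        super("android.ui");   // same thread name the real UiThread uses
        setDaemon(true);
    }

    @Override
    public void run() {
        try {
            while (true) {
                queue.take().run();   // drain work forever, like Looper.loop()
            }
        } catch (InterruptedException e) {
            // interruption ends the loop
        }
    }

    // Called with the class lock held, like UiThread.ensureThreadLocked().
    private static void ensureThreadLocked() {
        if (sInstance == null) {
            sInstance = new SketchUiThread();
            sInstance.start();        // started as soon as it is created
        }
    }

    static SketchUiThread get() {
        synchronized (SketchUiThread.class) {
            ensureThreadLocked();
            return sInstance;
        }
    }

    /** Queues work onto the singleton thread, in place of getHandler().post(). */
    static void post(Runnable r) {
        get().queue.add(r);
    }
}
```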

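Moreover, all four classes extend ServiceThread, which in turn extends HandlerThread. The part that matters is each thread's Handler, because system services such as AMS, WMS and PMS post work to these threads through it.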

//frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java
final class UiHandler extends Handler {
    public UiHandler() {
        super(com.android.server.UiThread.get().getLooper(), null, true);
    }

    @Override
    public void handleMessage(Message msg) {
        switch (msg.what) {
            case SHOW_ERROR_UI_MSG:
            case SHOW_NOT_RESPONDING_UI_MSG:
            case SHOW_STRICT_MODE_VIOLATION_UI_MSG:
            case WAIT_FOR_DEBUGGER_UI_MSG:
            case DISPATCH_PROCESSES_CHANGED_UI_MSG:
            case DISPATCH_PROCESS_DIED_UI_MSG:
            case DISPATCH_UIDS_CHANGED_UI_MSG:
            case DISPATCH_OOM_ADJ_OBSERVER_MSG:
        }
    }
}

UiHandler is built directly on UiThread's Looper. A thread has exactly one Looper and one MessageQueue, but any number of Handlers can be attached to them.

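The handleMessage() cases (error dialogs, ANR dialogs and so on) show that UI work does not have to run on the main thread. This does not really contradict Android's rule that the UI must be updated on the main thread: strictly, a view hierarchy may only be touched by the thread that created it, and these dialogs are created on android.ui.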
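The "one Looper, many Handlers" relationship can be modeled with a single-threaded executor. Here `MiniHandler` is a hypothetical stand-in for android.os.Handler: each instance has its own handleMessage callback, but both submit into the same queue, so every message runs on the one thread, named "android.ui" to echo UiThread.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntConsumer;

// Sketch: two handler-like objects sharing one thread and one queue.
final class SharedLooperSketch {
    // Hypothetical Handler stand-in: its own callback, a shared queue.
    static final class MiniHandler {
        private final ExecutorService looper;
        private final IntConsumer handleMessage;

        MiniHandler(ExecutorService looper, IntConsumer handleMessage) {
            this.looper = looper;
            this.handleMessage = handleMessage;
        }

        Future<?> sendMessage(int what) {
            return looper.submit(() -> handleMessage.accept(what));
        }
    }

    /** Sends one message through each handler; returns the thread each ran on. */
    static String[] demo() {
        // One thread draining one queue, named like UiThread.
        ExecutorService looper = Executors.newSingleThreadExecutor(r -> new Thread(r, "android.ui"));
        String[] ranOn = new String[2];
        MiniHandler uiHandler = new MiniHandler(looper, what -> ranOn[0] = Thread.currentThread().getName());
        MiniHandler otherHandler = new MiniHandler(looper, what -> ranOn[1] = Thread.currentThread().getName());
        try {
            uiHandler.sendMessage(1).get();
            otherHandler.sendMessage(2).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            looper.shutdown();
        }
        return ranOn;
    }
}
```

Because both handlers share the same thread, a message that blocks in one handler also stalls every other handler on that thread, which is exactly why the watchdog checks the thread rather than individual handlers.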

2. Which services use which thread

UiThread      --> ActivityManagerService
DisplayThread --> WindowManagerService, InputManagerService, DisplayManagerService
IoThread      --> PackageInstallerService, StorageManagerService, BluetoothManagerService

V. Summary

1. Watchdog's core objects are mHandlerCheckers and mMonitorChecker.

mHandlerCheckers: detect whether a thread's message queue is blocked.

mMonitorChecker: detects whether a core system service holds its lock for too long.

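Every HandlerChecker posts itself to the front of its thread's queue (skipping the post when mHandler.getLooper().getQueue().isPolling() shows the thread is idle) and times out if it does not get to run in time; mMonitorChecker additionally calls each service's monitor(), which times out if synchronized(this) cannot acquire the service lock. BinderThreadMonitor is the special case: it times out when the binder threads stay saturated at the system maximum.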

2. After a timeout, the system prints a series of logs; analyzing them together is the way to find the cause.

3. After a timeout, Watchdog kills its own process, so the system_server process comes back with a different pid.

Reference: "Android WatchDog原理分析" (Android WatchDog analysis), android原理分析 blog: https://blog.csdn.net/weixin_28543661/article/details/117344345